Abstract


This paper considers the effects of different types of faults on a proton exchange membrane fuel cell (PEMFC) model. Using databases (which
record the fault effects) and probabilistic methods (such as the Bayesian-score and Markov chain Monte Carlo), a graphical-probabilistic structure
for fault diagnosis is constructed. The graphical model defines the cause-effect relationship among the variables, and the probabilistic method
captures the numerical dependence among these variables. Finally, the Bayesian network (i.e. the graphical-probabilistic structure) is used to
diagnose fault causes in the PEMFC model based on the observed effects.


© 2006 Elsevier B.V. All rights reserved.

1. Introduction


Environmental issues have increased the demand for less pol- 
luting energy generation technologies. Governmental actions 
to support a hydrogen-based economy are under way, as well. 
Most recent developments in proton exchange membrane fuel 
cell (PEMFC) technology have made them commercially avail- 
able for stationary and mobile applications in the range of up to 
200 kW. 

Fuel cells (FCs) convert the energy contained in hydrogen
directly into electricity, with only water and heat as the products
of the reaction. Hydrogen (H2) is supplied under pressure to a
porous conductive electrode (the anode). The H2 spreads through
the electrode until it reaches the catalytic layer of the anode,
where it reacts to form protons and electrons. The H+ ions (or
protons) flow through the electrolyte (a solid membrane), and the
electrons pass through an external electrical circuit, producing
electrical energy. On the other side of the cell, the oxygen (O2)
spreads through the cathode and reaches its catalytic layer. On
this layer, the O2, the H+ protons, and the electrons produce
liquid water and residual heat as by-products [5].

Several papers have been published considering FC operation
in normal conditions, but only a few of them address FC
operation under fault conditions. Faults are events that cannot
be ignored in any real machine, and their consideration is
essential for improving the operability, flexibility, and autonomy
of commercial equipment.

In this paper, Bayesian network algorithms are applied to
the construction of a graphical-probabilistic structure for fault
diagnosis in PEMFCs.

This paper is organized as follows. In Section 2, the basic con- 
cepts for the mathematical model of a PEMFC are introduced. 
In Section 3, four types of faults in PEMFC are considered: 
faults in the air fan; faults in the refrigeration system; growth 
of the fuel crossover; and faults in the hydrogen pressure.
Section 4 gives a short background on Bayesian networks and on
the learning algorithms applied to fault diagnosis of PEMFCs.


2. The fuel cell model 


A mathematical model of a fuel cell (FC) was used to study the
possible fault effects. This model consists of an electro-chemical
and a thermo-dynamical sub-model.


2.1. The electrochemical model


The output voltage $V_{FC}$ of a single cell can be defined as the
result of the following expression [11]:


$$V_{FC} = E_{Nernst} - V_{act} - V_{ohmic} - V_{con} \tag{1}$$


$E_{Nernst}$ is the thermodynamic potential of the cell representing
its reversible voltage:


$$E_{Nernst} = 1.229 - 0.85 \times 10^{-3}\,(T - 298.15) + 4.31 \times 10^{-5}\, T \left[\ln(P_{H_2}) + \tfrac{1}{2}\ln(P_{O_2})\right] \tag{2}$$


where $P_{H_2}$ and $P_{O_2}$ (atm) are the hydrogen and oxygen
pressures, respectively, and $T$ (K) is the operating temperature.

$V_{act}$ is the voltage drop due to the activation of the anode and
the cathode:


$$V_{act} = -\left[\xi_1 + \xi_2 T + \xi_3 T \ln(c_{O_2}) + \xi_4 T \ln(I_{FC})\right] \tag{3}$$


where $\xi_i$ ($i = 1, \ldots, 4$) are specific coefficients for every type of FC,
$I_{FC}$ (A) is the electrical current, and $c_{O_2}$ (atm) is the oxygen
concentration.

$V_{ohmic}$ is the ohmic voltage drop associated with the conduction
of protons through the solid electrolyte, and of electrons
through the internal electronic resistance:


$$V_{ohmic} = I_{FC}\,(R_M + R_C) \tag{4}$$


where $R_C$ (Ω) is the contact resistance to electron flow, and $R_M$
(Ω) is the resistance to proton transfer through the membrane:


$$R_M = \frac{\rho_M\, \ell}{A}, \qquad \rho_M = \frac{181.6\left[1 + 0.03\,(I_{FC}/A) + 0.062\,(T/303)^2\,(I_{FC}/A)^{2.5}\right]}{\left[\psi - 0.634 - 3\,(I_{FC}/A)\right]\exp\left[4.18\,(T - 303)/T\right]} \tag{5}$$


where $\rho_M$ (Ω cm) is the specific resistivity of the membrane, $\ell$ (cm)
is the thickness of the membrane, $A$ (cm$^2$) is the active area of the
membrane, and $\psi$ is a coefficient for every type of membrane.
$V_{con}$ represents the voltage drop resulting from the mass
transportation effects, which affect the concentration of the reacting
gases:


$$V_{con} = -B \ln\left(1 - \frac{J}{J_{max}}\right) \tag{6}$$


where $B$ (V) is a constant depending on the type of FC, $J_{max}$ is
the maximum electrical current density, and $J$ is the electrical
current density produced by the cell. In general, $J = J_{out} + J_n$,
where $J_{out}$ is the real electrical output current density, and $J_n$ is
the fuel crossover and internal loss current.

Considering a stack composed of several FCs, and as a first-order
analysis, the output voltage is $V_{stack} = n_r V_{FC}$, where $n_r$ is
the number of cells composing the stack. However, constructive
characteristics of the stack, such as flow distribution and heat
transfer, should be taken into account [1,10,19].

In this paper, a mathematical model for a 500 W stack
(manufactured by BCS Technologies) is used. The parameters for
this particular model are presented in Table 1. In [6] the
polarization curve obtained with this model is compared to the
polarization curve of the manufacturing data sheet to validate the
model.


Table 1
Parameters of a PEMFC BCS, 500 W

Parameter    Value
$n_r$        32
$A$          64 cm$^2$
$\ell$       178 µm
$P_{O_2}$    0.2095 atm
$P_{H_2}$    1 atm
$R_C$        0.003 Ω
$B$          0.016 V
$\xi_1$      −0.948
$\xi_2$      0.00286 + 0.0002 ln $A$ + (4.3 × 10$^{-5}$) ln $c_{H_2}$
$\xi_3$      7.6 × 10$^{-5}$
$\xi_4$      −1.93 × 10$^{-4}$
$\psi$       23.0
$J_n$        3 mA cm$^{-2}$
$J_{max}$    0.469 A cm$^{-2}$

In general, these parameters are based on manufacturing data
and laboratory experiments, and their accuracy can affect the
simulation results. In [5], a multi-parametric sensitivity analysis
is performed to define the importance of the accuracy of
each parameter. Basically, the parameters are classified in three
groups: insensitive ($A$, $R_C$, $\ell$), sensitive ($J_n$, $B$, $\psi$, $\xi_4$), and highly
sensitive ($J_{max}$, $\xi_3$, $\xi_1$). The accuracy was analyzed in
normal conditions, considering variations of around ±10% of the
normal values. However, in fault conditions, those variations can
be stronger, as presented in Sections 3.1-3.4.
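
As a numerical illustration of Eqs. (2) and (4)-(6), the following Python sketch evaluates the Nernst potential and the ohmic and concentration drops with the Table 1 values. The operating point (10 A at 333.15 K) and all function names are assumptions made only for this example; the activation term of Eq. (3) is left out because it needs gas concentrations that are not specified in this section.

```python
import math

# Table 1 values for the 500 W BCS stack
A_CM2 = 64.0      # active area (cm^2)
L_CM = 178e-4     # membrane thickness: 178 um expressed in cm
P_O2 = 0.2095     # oxygen pressure (atm)
P_H2 = 1.0        # hydrogen pressure (atm)
R_C = 0.003       # contact resistance (ohm)
B = 0.016         # concentration-loss constant (V)
PSI = 23.0        # membrane coefficient
J_N = 0.003       # crossover current density (A cm^-2)
J_MAX = 0.469     # maximum current density (A cm^-2)


def nernst(t_kelvin):
    """Eq. (2): reversible cell voltage (V)."""
    return (1.229 - 0.85e-3 * (t_kelvin - 298.15)
            + 4.31e-5 * t_kelvin * (math.log(P_H2) + 0.5 * math.log(P_O2)))


def membrane_resistance(i_fc, t_kelvin):
    """Eq. (5): membrane resistance R_M (ohm)."""
    j = i_fc / A_CM2
    numerator = 181.6 * (1 + 0.03 * j + 0.062 * (t_kelvin / 303) ** 2 * j ** 2.5)
    denominator = (PSI - 0.634 - 3 * j) * math.exp(4.18 * (t_kelvin - 303) / t_kelvin)
    return (numerator / denominator) * L_CM / A_CM2


def ohmic_drop(i_fc, t_kelvin):
    """Eq. (4): ohmic voltage drop (V)."""
    return i_fc * (membrane_resistance(i_fc, t_kelvin) + R_C)


def concentration_drop(i_fc, j_n=J_N):
    """Eq. (6) with J = J_out + J_n (V)."""
    j = i_fc / A_CM2 + j_n
    return -B * math.log(1 - j / J_MAX)


# Assumed operating point (not taken from the paper)
I_FC, T = 10.0, 333.15
print(f"E_Nernst = {nernst(T):.4f} V")
print(f"V_ohmic  = {ohmic_drop(I_FC, T):.4f} V")
print(f"V_con    = {concentration_drop(I_FC):.4f} V")
```

At this assumed operating point both voltage drops are small compared with the Nernst potential, so most of the remaining loss in Eq. (1) would come from the activation term.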


2.2. The thermo-dynamical model 


The calculation of the relative humidity and the operating 
temperature of the FC essentially compose the thermo- 
dynamical model [7]. 


2.2.1. Temperature 
The variation of temperature is obtained with the following
differential equation:


$$\frac{dT}{dt} = \frac{\Delta \dot{Q}}{M C_s} \tag{7}$$


where $M$ (kg) is the whole stack mass, $C_s$ (J K$^{-1}$ kg$^{-1}$) is the
average specific heat coefficient of the stack, and $\Delta \dot{Q}$ (J s$^{-1}$) is the
rate of heat variation, i.e. the difference between the rate of
heat generated by the cell operation ($\dot{Q}_{gen}$) and the rate of heat
removed. Heat can be removed by the air flowing inside the
stack ($\dot{Q}_{rem1}$), by the refrigeration system ($\dot{Q}_{rem2}$), by water
evaporation ($\dot{Q}_{rem3}$), and by heat exchanged with the
surroundings ($\dot{Q}_{rem4}$).

In this FC system, the refrigeration system is turned on when 
the operating temperature is higher than 50°C. 
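
The thermal balance of Eq. (7) together with the 50 °C switching rule can be sketched with a simple forward-Euler integration. Every numerical value below (stack mass, specific heat, heat rates, the lumped passive-removal coefficient) is hypothetical and chosen only so the example runs; none of them comes from the paper's model.

```python
def simulate_temperature(minutes=60.0, dt=1.0):
    """Forward-Euler integration of Eq. (7); returns a list of (time_s, temp_C)."""
    mass = 5.0          # stack mass M (kg), hypothetical
    c_s = 1100.0        # specific heat C_s (J K^-1 kg^-1), hypothetical
    q_gen = 300.0       # heat generated by the cell (W), hypothetical
    h_passive = 3.0     # lumped passive heat removal (W K^-1), hypothetical
    q_refrig = 200.0    # refrigeration removal when active (W), hypothetical
    t_ambient = 25.0    # ambient temperature (deg C)

    temp = t_ambient
    history = [(0.0, temp)]
    steps = int(minutes * 60 / dt)
    for step in range(1, steps + 1):
        q_removed = h_passive * (temp - t_ambient)
        if temp > 50.0:              # refrigeration turns on above 50 deg C
            q_removed += q_refrig
        temp += dt * (q_gen - q_removed) / (mass * c_s)
        history.append((step * dt, temp))
    return history


history = simulate_temperature()
print(f"T after 5 min:  {history[300][1]:.1f} C")
print(f"T after 60 min: {history[-1][1]:.1f} C")
```

With these assumed values the temperature rises quickly to 50 °C and then more slowly once the refrigeration term is active, which is the qualitative behavior the model describes.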

2.2.2. Relative humidity

A correct level of humidity should be maintained in the FC.
This level is measured through the relative humidity HR. The
relative humidity $HR_{out}$ of the output air is calculated from the
equation:

$$HR_{out} = \frac{P_{W_{in}} + P_{W_{gen}}}{P_{sat\_out}} \tag{8}$$

where $P_{W_{in}}$ is the partial pressure of the water in the inlet air;
$P_{W_{gen}}$ is the partial pressure of the water generated by the
chemical reaction [11]; and $P_{sat\_out}$ is the saturated vapor pressure in the
output air. Considering that $HR_{out} \times P_{sat\_out} = P_{W_{out}}$, Eq. (8)
establishes the balance of water: output = input + internal
generation.

$P_{sat}$ is calculated from the equation:


$$P_{sat} = T^{a}\, e^{(b/T + c)}$$

If $T > 273.15$ K, then $a = -4.9283$, $b = -6763.28$, and $c = 54.22$.

If the HR is much smaller than 100%, then the membrane
dries out and the conductivity decreases. On the other hand, a
relative humidity greater than 100% produces accumulation of
liquid water on the electrodes, which can become flooded and
block the pores; this makes gas diffusion difficult. The result of
these two conditions is a fairly narrow range of normal operating
conditions. In conclusion, the ideal operational condition is
HR = 100%. In this equipment, the control system adjusts the
air-reaction volume to maintain the HR close to 100%. In [16]
this control technique has been implemented.

In abnormal conditions some parameters change: flooding
and drying affect $R_C$ and $R_M$, respectively. In [9], the variation
of these resistances was also associated with the detection of
flooding and drying faults.

Fig. 1 (adapted from [11]) illustrates the variation of temperature
and relative humidity for different air stoichiometry
ratios ($\lambda = 2, 4$). The stoichiometry $\lambda$ is the ratio
between the inlet air and the air necessary for the chemical
reaction. In general, the maximum efficiency occurs at about
80% of fuel (H2) utilization and 50% of oxygen utilization.
Therefore, for a good concentration of O2 in the air through
the entire FC, $\lambda$ should be greater than 2 [11].

Fig. 1. Temperature and relative humidity for $\lambda = 2, 4$ (adapted from [11]).

Fig. 2. Variation of output HR vs. input HR.

To prevent the membrane from drying, some researchers (e.g. 
[11]) have proposed extra humidification on the input air. How- 
ever, the variation in the HR of the input air produces a very small 
adjustment in the output HR; for example, a variation of 10% 
on the input HR represents a variation of approximately 2% on 
the output HR. Thus, in many cases, the extra humidification of 
the input air is not enough to resolve the drying problem. Fig. 2 
illustrates the variation produced on the HR of output air by the 
adjustment in the HR of input air. 


2.3. Normal operation of a fuel cell 


Fig. 3 illustrates the evolution of a few PEMFC variables in
normal operating conditions as a function of time. The variables
are: stack voltage $V_{stack}$ (V), electrical current $I_{FC}$ (A), temperature (°C),
volume of air flow (L s$^{-1}$), and air stoichiometry ratio
$\lambda$. In this test, the FC supplies a constant-load demand; thus,
the voltage and current vary to maintain
this demand (i.e. the output power remains constant). Also, the
control system adjusts the air-reaction volume to maintain the
HR close to 100%.

The simulation begins with the FC system in stand-by (i.e.
without load, and at ambient temperature, approximately
25 °C). After the load is applied, the electrical equilibrium
(i.e. the equilibrium of voltage and current) is reached in less
than 3 s. The temperature, on the other hand, increases
until it reaches 50 °C at t = 10 min, when the refrigeration
system is turned on. The temperature then increases slowly until
the thermo-dynamical steady state is reached after t = 40 min.
Note that variations in the temperature influence the
evolution of the airflow and $\lambda$. Voltage and current also
vary, especially in the first 10 min, but these variations
are produced by the slower evolution of the thermo-dynamical
state.




L.A.M. Riascos et al. / Journal of Power Sources 165 (2007) 267-278 


[Figure 3: five panels, each plotted against time (min) from 0 to 60.]

Fig. 3. Evolution of variables of a FC in normal conditions deriving from a mathematical model in MATLAB®.


3. Faults in fuel cells


In general, two types of fault detection can be considered:


• Faults that can be detected by monitoring a specific variable.
For example, a fuel leak can be detected by installing a
specific gas sensor. In this case, a diagnosis is not necessary.
• Faults that cannot be detected directly by monitoring, or faults
that need some type of diagnosis.


Usually, fault detection on commercial fuel cell equipment is 
limited to detection of faults of the first type. This work focuses 
on fault detection of the second type. 

Four types of faults in PEMFCs are considered in this study: 
(1) fault in the air fan, (2) fault in the refrigeration system, (3) 
growth of the fuel crossover, and (4) fault in the hydrogen pres- 
sure. The effects of these faults are included in the mathematical 
model to analyze the behavior of the FC system in fault operation 
conditions. 


3.1. Fault in the air reaction fan 


A reduction of the reaction air caused by a fault in the air fan can
produce two major effects: (1) accumulation of liquid water that
cannot be evaporated and (2) reduction of the O2 volume below that
necessary for a complete reaction with the H2.

A common method for removing excess water inside the FC
is to use the air flowing through it. The correct variation of the
stoichiometry $\lambda$ maintains the HR close to 100%. However,
when a fault in the air fan takes place, this becomes impossible.
This fault reduces the air reaction flow, which reduces the water
evaporation volume and permits the accumulation of water. A
large accumulation of water causes the flooding of the electrodes,
making gas diffusion difficult and affecting the performance
of the FC. These effects are simulated by Eq. (9), which was
obtained empirically.


$$R_{C(k)} = R_{C(0)} \left(\frac{W_{acum(k)}}{const_1}\right)^{0.8}, \qquad J_{max(k)} = \frac{J_{max(0)}}{\left(W_{acum(k)}/const_1\right)} \tag{9}$$


where $J_{max(0)}$ is the value of the maximum electrical current
density at the initial state (normal condition), $R_{C(0)}$ is the value
of the contact resistance at the initial state (normal condition), $W_{acum(k)}$
is the volume of water accumulated at instant $k$, and $const_1$ is a
constant defining when the electrodes are led to flooding.

The second effect of a fault in the air fan occurs when $\lambda$
is below the practical and recommended value. In this case,
the O2 concentration is reduced and the exit air is completely
depleted of O2. This reduction of O2 concentration has a
negative effect on $E_{Nernst}$ (Eq. (2)) and increases $V_{act}$
(Eq. (3)). In this case, the O2 concentration changes according
to empirical Eq. (10):


$$c_{O_2(k)} = \frac{c_{O_2}}{\sqrt{const_2/\lambda_{(k)}}} \tag{10}$$


where $c_{O_2(k)}$ is the O2 concentration at instant $k$, $c_{O_2}$ is the
normal O2 concentration in the air, and $const_2$ is a constant defining
when $\lambda$ is lower than necessary for the chemical reaction.

Fig. 4 illustrates the evolution of a few variables when a fault
in the air fan is considered. The variables are stack voltage (V),
$I_{FC}$ (A), temperature (°C), air flow (L s$^{-1}$), $\lambda$, and accumulated
water (L). In this case, the starting point of the simulation
(t = 0) is an FC in thermal steady state. The fault in the air
fan takes place at t = 30 min. The initial effect is the variation
of the airflow volume, which reduces $\lambda$ to a value close to 1, affecting
voltage and $I_{FC}$. This fault also produces accumulation of
liquid water and, at t = 45 min, the accumulated water is
enough to change the resistance of the electrodes,
affecting voltage and $I_{FC}$ continuously.

Fig. 4. Evolution of variables by fault in the reaction air fan.


3.2. Fault in the refrigeration system 


The refrigeration system maintains the temperature within the
normal operating conditions. When the temperature increases,
the reaction air has a drying effect and reduces the relative
humidity HR. A low HR can have a catastrophic effect on
the polymer electrolyte membrane, which not only relies totally
upon a high water content, but is also very thin (and thus prone to
rapid drying out). The drying of the membrane changes its
resistance to proton flow ($R_M$, Eq. (4)). $R_M$ is affected
through the adjustment of $\psi$ (Eq. (5)), which varies according to
empirical Eq. (11):


$$\psi_{(k)} = \frac{\psi_{(0)}}{\left(const_3/HR_{out(k)}\right)} \tag{11}$$

where $const_3$ defines when the membrane is led to drying.

The variation of $R_M$ produces an increase in the ohmic voltage
drop $V_{ohmic}$ (Eq. (4)), and therefore a reduction of $V_{FC}$
(Eq. (1)).

Fig. 5 illustrates the evolution of the variables stack voltage
(V), $I_{FC}$ (A), temperature (°C), air flow (L s$^{-1}$), $\lambda$, and heat
removed by the refrigeration system $\dot{Q}_{rem2}$ (W), when a total
fault in the refrigeration system is considered at t = 30 min.

The initial fault effect (at t = 30 min) is the increase in
temperature. Then, the FC controller automatically reduces $\lambda$,
maintaining the performance of the FC. However, once $\lambda = 2$
(i.e. the minimum value recommended) is reached, $\lambda$ is not further
reduced, and the drying effect then has a continuous influence on the
stack voltage, $I_{FC}$, air flow, and other variables.


3.3. Increase of fuel crossover (Jn) 


There is a small amount of wasted fuel that migrates through 
the membrane. It is defined as fuel crossover—some hydrogen 
will diffuse from the anode (through the electrolyte) to the cath- 
ode, react directly with the oxygen, and produce no current for 
the FC. 

In normal conditions, the flow of fuel through the membrane
($J_n$) is very small, typically representing only a few mA cm$^{-2}$. A
sudden increase in this variable can be associated with rupture
of the membrane.

This variation of $J_n$ produces an increase in the concentration
voltage drop ($V_{con}$, Eq. (6)), and therefore a reduction of $V_{FC}$
(Eq. (1)).

Fig. 6 illustrates the evolution of the variables stack voltage
(V), $I_{FC}$ (A), temperature (°C), air flow (L s$^{-1}$), and $\lambda$, when
$J_n$ is suddenly changed from 0.003 to 0.1 A cm$^{-2}$ at
t = 30 min.

The initial effect is a variation in all the variables, including
the power produced by the FC. The FC controller automatically
adjusts the stoichiometry (by reducing the airflow) until, at
t = 47 min, it reaches $\lambda = 2$ and cannot be further reduced.
This affects the output HR and other variables.
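
The effect of the crossover step on the concentration drop can be checked directly from Eq. (6) with $J = J_{out} + J_n$ and the Table 1 constants. The output current density used below is an assumed value for illustration only.

```python
import math

B = 0.016        # V, Table 1
J_MAX = 0.469    # A cm^-2, Table 1


def v_con(j_out, j_n):
    """Eq. (6) with total current density J = J_out + J_n."""
    return -B * math.log(1 - (j_out + j_n) / J_MAX)


J_OUT = 0.15     # assumed output current density (A cm^-2), not from the paper
normal = v_con(J_OUT, 0.003)   # normal crossover, 3 mA cm^-2
fault = v_con(J_OUT, 0.1)      # crossover after the simulated fault
print(f"V_con normal = {normal * 1000:.2f} mV, after fault = {fault * 1000:.2f} mV")
```

Under this assumption the concentration drop roughly doubles, and it grows without bound as $J_{out} + J_n$ approaches $J_{max}$.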


3.4. Fault in the hydrogen feed line 


In general, for mobile and stationary applications, the hydrogen
is supplied from a high-pressure bottle and reduced by a
pressure controller. In normal conditions, the hydrogen pressure
is assumed to be constant (1 atm). A variation in the hydrogen
pressure affects the performance of the FC, and a lower pressure
affects it negatively. The reduction
of H2 pressure reduces the current density $J$, affecting $I_{FC}$;
it also decreases $E_{Nernst}$ (Eq. (2)), increases $V_{act}$ (Eq. (3)), and
has a corresponding effect on $V_{FC}$ (Eq. (1)).

Fig. 5. Evolution of variables by fault in the refrigeration system.

Fig. 7 illustrates the evolution of the variables stack voltage
(V), $I_{FC}$ (A), temperature (°C), air flow (L s$^{-1}$), $\lambda$, and H2
pressure (atm), when the H2 pressure is reduced
from 1 to 0.2 atm at t = 30 min.

A fault in the oxygen feed line (such as a fault produced by
a blocked air filter) can be an interesting issue in a
fault-tolerant FC system. In practical applications, the oxygen is
supplied from the air, where it has a constant pressure.
Therefore, a fault in the air reaction feed line does not produce a
variation in the air (or oxygen) pressure; instead, a reduction
in the O2 concentration can be produced. However, the effects
of this fault are similar to those of a fault in the air reaction fan (see
Section 3.1).

Fig. 7. Evolution of variables by reduction in H2 pressure.

In Section 3, the effects of four types of faults on the FC
operation were explained simply and directly. However, when a fault
happens, the variables become interdependent, which makes
diagnosis of the fault cause difficult.
Figs. 4-7 illustrate this dependence: in all cases of faults,
all the variables change.

By implementing those types of faults, and their effects, on 
the mathematical model of the FC, databases for recording the 
evolution of variables in fault conditions can be constructed. 
Then, probabilistic approaches can be applied on the databases 
to qualify and quantify the dependency relationship among the 
variables. In the next section, Bayesian networks are considered
for the construction of a graphical-probabilistic structure based
on databases.


4. Bayesian networks for fault diagnosis 


Bayesian networks have been extensively applied to fault
diagnosis, e.g. [12] and [3]; in the area of fuel cells, however,
this is a new field. In [12], a Bayesian network is implemented
for controlling an unsupervised fault-tolerant system that generates
oxygen from the CO2 in the Martian atmosphere. In [3], a Bayesian
network is applied to fault diagnosis in a power delivery system.
One advantage of a Bayesian network is that it allows expert
knowledge of the process and probabilistic theory to be combined
in the construction of a diagnostic procedure; indeed,
both are recommended for the construction of a “good”
Bayesian network.

A Bayesian network is a structure that graphically models
relationships of probabilistic dependence within a group
of variables. A Bayesian network $B = (G, CP)$ is composed of
the network structure and the conditional probabilities (CP). A
directed acyclic graph (DAG) represents the graphical structure
$G$, where each node of the graph is associated with a variable $X_i$,
and each node has a set of parents $pa(X_i)$. The conditional
probabilities CP numerically capture the probabilistic dependence
among the variables [2].
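
To make the pair $B = (G, CP)$ concrete, the sketch below encodes a two-node network (one fault cause, one observed symptom) as a parent map plus conditional probabilities, and computes the posterior probability of the fault by enumeration. The node names and all probabilities are hypothetical and are not taken from the network learned in this paper.

```python
# Structure G: each node maps to the tuple of its parents.
parents = {"fault": (), "symptom": ("fault",)}

# CP: P(node = 1 | parent values) for every node; all numbers are hypothetical.
cp_true = {
    "fault": {(): 0.25},
    "symptom": {(0,): 0.10, (1,): 0.90},
}


def joint(assignment):
    """Joint probability as the product of P(X_i | pa(X_i)) over all nodes."""
    prob = 1.0
    for node, node_parents in parents.items():
        p_one = cp_true[node][tuple(assignment[p] for p in node_parents)]
        prob *= p_one if assignment[node] == 1 else 1.0 - p_one
    return prob


def posterior_fault(symptom):
    """P(fault = 1 | symptom) obtained by enumerating both fault states."""
    with_fault = joint({"fault": 1, "symptom": symptom})
    without_fault = joint({"fault": 0, "symptom": symptom})
    return with_fault / (with_fault + without_fault)


print(f"P(fault | symptom observed) = {posterior_fault(1):.3f}")
print(f"P(fault | symptom absent)   = {posterior_fault(0):.3f}")
```

Enumeration is only practical for very small networks; it is used here because it shows the factorization directly.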

The construction of a graph to describe a diagnostic process
can be executed in two ways:


• Based on human knowledge about the process, where
relationships among variables are established to define the criteria for
choosing the next state (i.e. the relationship between variables
and parents);

• Based on probabilistic methods using databases of records.


The construction of a Bayesian structure G based on 
knowledge can be relatively simple; but its efficacy depends 
completely on the human expert knowledge about that domain. 

Probabilistic methods for structure learning can follow two
approaches: constraint-based and
search-and-score. In the constraint-based approach, the starting
point is an initially given graph G; edges are then removed
or added if certain conditional independencies are measured in
the database. In the search-and-score approach, a search through
the space of possible DAGs is performed to find the best
DAG. In this research, the Bayesian-score (K2) [4] and Markov
chain Monte Carlo (MCMC) [14] algorithms are applied. The
K2 and MCMC algorithms are relatively easy to apply
to the automatic generation of the graph, and they are already
implemented in the MatLab BNToolbox [13].

The number of DAGs as a function of the number of nodes,
$f(n)$, grows super-exponentially with $n$. According to [4], a recursive
function gives the number of DAGs as a function
of the number of variables:

$$f(n) = \sum_{i=1}^{n} (-1)^{i+1} \binom{n}{i}\, 2^{i(n-i)} f(n-i) \tag{12}$$

with $f(0) = 1$. For example, a model with 16 variables ($n = 16$) has
$8.38 \times 10^{46}$ possible DAGs. Thus, an exhaustive search over the
space of all DAGs is not practical. Therefore, a local (e.g. K2)
or a stochastic (e.g. MCMC) search should be made.
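
The recursion of Eq. (12) is short enough to evaluate exactly with integer arithmetic. The following sketch (the function name is chosen here for illustration) reproduces the count quoted for $n = 16$.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_dags(n):
    """Number of labeled DAGs on n nodes, using the recursion of Eq. (12)."""
    if n == 0:
        return 1
    return sum((-1) ** (i + 1) * comb(n, i) * 2 ** (i * (n - i)) * count_dags(n - i)
               for i in range(1, n + 1))


for n in (1, 2, 3, 4, 16):
    print(n, f"{count_dags(n):.3e}")
```

Already at four variables there are 543 candidate structures, which is why the search is restricted to a local or stochastic strategy.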

In this work, the construction of a Bayesian network for
fault diagnosis begins with the generation of a graph by applying
probabilistic methods; the graph is then refined using constraints
and domain knowledge. The complete sequence consists of the
following steps:


1. Construction of the database: the records are provided
by a mathematical model of a PEMFC implemented in
MatLab®. Field experiments could also provide those records,
as considered in [16]; however, two major problems are
pointed out: (a) a large amount of data is necessary, and
the generation of each case takes around 2 h of supervised
experiments, and (b) variables such as Qgen, flooding, λ, etc.
are difficult to monitor.

2. Implementation of search-and-score algorithms (K2 and 
MCMC) to find the initial structure. The probabilistic 
approaches were implemented using the BNT (Bayesian Net- 
work Toolbox) developed for MatLab® [13]. 

3. Constraint-based conditions and knowledge are applied for 
improving the structure. 

4. Calculation of conditional probabilities. The conditional 
probabilities are calculated on the resulting structure. 


4.1. Database generation 


In this research, the diagnosis is executed at a specific
moment, and only if an abnormal evolution of any variable is
observed; the idea is to associate this evolution with symptoms
of incipient faults. Binary states of the variables are then
generated (0 = normal, 1 = abnormal). The general procedure is to
monitor a specific variable: if, after a fault takes place,
the value of that variable leaves a certain tolerance band,
a flag is set to “1”. Fig. 8 represents the
range of tolerance of $I_{FC}$ and its evolution after a fault at
t = 30 min.

The next step is the construction of a vector containing the 
value of all variables. This vector corresponds to a single case 
in the database with values of all variables in a certain period. 
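
The flagging rule and the assembly of one database case can be sketched as follows. The signal names, reference values and tolerances are hypothetical; the paper does not state the width of the tolerance band.

```python
def abnormal_flag(samples, reference, tolerance):
    """Return 1 if any sample leaves the band reference +/- tolerance, else 0."""
    return int(any(abs(x - reference) > tolerance for x in samples))


def build_case(post_fault_signals, references, tolerances):
    """Build one binary case vector, one flag per monitored variable."""
    return [abnormal_flag(post_fault_signals[name], references[name], tolerances[name])
            for name in references]


# Hypothetical post-fault samples for two monitored variables
signals = {"I": [20.0, 20.2, 23.5], "T": [55.0, 55.4, 55.1]}
references = {"I": 20.0, "T": 55.0}
tolerances = {"I": 1.0, "T": 1.0}
print(build_case(signals, references, tolerances))
```

In this example only the current leaves its band, so the case records an abnormal state for I and a normal state for T.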

From the mathematical model, the evolution of variables that
can be difficult to monitor on a real machine (such as Qgen or HR)
can be observed. Records of all variables are essential for the
construction of the network structure, since they avoid hidden variables.

The variables considered are the following: 


Jn = fault by fuel crossover

aF = fault in the air fan

rF = fault in the refrigeration system

H2 = fault by low H2 pressure

Flow = volume of air flow

Qgen = generated heat

LL = stoichiometry air relationship λ

HRout = output relative humidity

Drying = drying of membrane

Flood = flooding of electrodes

Ov = overload (i.e. the FC is working close to the maximum
load; in those cases, some variables can evolve differently)

Volt = stack voltage

I = electrical current of the FC, $I_{FC}$

T = temperature

Power = difference between real output power and required
load

pH2 = H2 pressure


[Figure 8: $I_{FC}$ (A) plotted against time (min) from 0 to 60, with the range of tolerance marked.]

Fig. 8. Evolution of $I_{FC}$ by a fault at t = 30 min.


A database with 10,000 cases was constructed for the
structure learning of a Bayesian network for fault diagnosis in fuel
cells. The database covers different operational conditions,
with different fault causes simulated and selected in a random
sequence.

A vector for fault diagnosis in FC has the structure presented
in Fig. 9.

Fig. 9. Generation of a vector for the construction of the database.


4.2. The Bayesian-score (K2) algorithm


The K2 algorithm [4] is a greedy search algorithm. Initially,
each node has no parents. It then incrementally adds, for each
node, the parent whose addition most increases the score of the
resulting structure. When the addition of no single parent
increases the score, it stops adding parents to the node. Before
the algorithm begins, an ordering of the variables must be defined,
so that the possible parents of every variable are those that
precede it in the order. Therefore, human-expert experience is important
to define that order. If the order is known, a search over this order
is more efficient than searching over all DAGs. The K2 algorithm
maximizes the following function:


$$P(G \mid D) \propto \prod_{i=1}^{n} \prod_{j=1}^{q_i} \frac{(r_i - 1)!}{(N_{ij} + r_i - 1)!} \prod_{k=1}^{r_i} N_{ijk}! \tag{13}$$


where $N_{ijk}$ is the number of occurrences of $\{X_i = x_{ik} \mid
pa(X_i) = \pi_{ij}\}$ in the database, $N_{ij} = \sum_k N_{ijk}$, $r_i$ is the
number of values of $X_i$, $q_i$ is the number of values of $pa(X_i)$,
and $n$ is the number of variables. $x_{ik}$ and
$\pi_{ij}$ are specific values of the variable $X_i$ and of $pa(X_i)$. $P(G \mid D)$ is
the score of the DAG $G$ to represent the database $D$.
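
A minimal sketch of the greedy K2 search is given below, assuming the factorial form of the Cooper and Herskovits score [4] evaluated in logarithms through `math.lgamma`. It is not the BNT implementation used in this work; the function names, the `max_parents` limit and the toy data set are assumptions made for illustration.

```python
from collections import Counter, defaultdict
from math import lgamma


def log_k2_node_score(data, child, parents, arity=2):
    """Log of one node's factor in the K2 score; unseen parent configurations add 0."""
    counts = defaultdict(Counter)
    for row in data:
        counts[tuple(row[p] for p in parents)][row[child]] += 1
    score = 0.0
    for value_counts in counts.values():
        n_ij = sum(value_counts.values())
        score += lgamma(arity) - lgamma(n_ij + arity)   # (r-1)! / (N_ij + r - 1)!
        score += sum(lgamma(n + 1) for n in value_counts.values())  # prod N_ijk!
    return score


def k2(data, order, max_parents=3, arity=2):
    """Greedy K2: parents of a node are chosen among the nodes that precede it."""
    parents = {node: [] for node in order}
    for position, node in enumerate(order):
        best = log_k2_node_score(data, node, parents[node], arity)
        candidates = set(order[:position])
        while candidates and len(parents[node]) < max_parents:
            new_score, new_parent = max(
                (log_k2_node_score(data, node, parents[node] + [c], arity), c)
                for c in candidates)
            if new_score <= best:
                break            # no single parent improves the score
            parents[node].append(new_parent)
            candidates.remove(new_parent)
            best = new_score
    return parents


# Toy data: column 1 copies column 0, column 2 is independent of both.
data = [(x0, x0, x2) for _ in range(10) for x0 in (0, 1) for x2 in (0, 1)]
print(k2(data, order=[0, 1, 2]))
```

On this toy data the search recovers the single dependence between columns 0 and 1 and leaves column 2 without parents, because adding a parent that carries no information lowers the score.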

Fig. 10 illustrates the resulting network structure applying
the K2 algorithm. The order of the variables is as follows:

Jn = 1, aF = 2, rF = 3, H2 = 4, Flow = 5, Qgen = 6, LL = 7,
Flood = 8, Drying = 9, HRout = 10, Ov = 11, Volt = 12, I = 13,
T = 14, Power = 15, pH2 = 16.

For example, according to Fig. 10, a probabilistic dependence
between variable 1 (as parent) and variables {2, 3, 4, 6, 9 and
11} (as children) is established from the database.


Fig. 10. Bayesian network structure implementing the K2 algorithm. 


4.3. The MCMC (Markov Chain Monte Carlo) algorithm 


The MCMC algorithm combines a Markov chain and
a Monte Carlo process. A Markov chain is a stochastic process
in which the next state depends only on the current state. When
a Markov chain is applied to Bayesian networks, the chain is the sequence
of DAGs in which the search for the best DAG is performed.

A Monte Carlo process is a probabilistic approximation of a very
complex or unknown function. Here, the Monte Carlo process finds
the very complex function (i.e. the DAG) that best agrees with the
evidence contained in the database by applying a probabilistic
approximation.

The MCMC algorithm starts at a specific point in the space 
of DAGs. The search is performed through all the nearest neigh- 
bors, and it moves to the neighbor that has the highest score. If 
no neighbor has a higher score than the current point, a local 
maximum has been found and the algorithm stops. A neighbor 
is the graph that can be generated from the current graph by 
adding, deleting or reversing a single arc. 
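
The neighborhood used by the search can be enumerated as in the sketch below, where a graph is a set of directed edges `(parent, child)` and candidates that would create a cycle are discarded. This shows only the neighbor definition; it does not reproduce the scoring or the move selection of the BNT routine, and the function names are chosen here for illustration.

```python
from itertools import permutations


def is_acyclic(nodes, edges):
    """Kahn's algorithm: True if the directed graph has no cycle."""
    indegree = {n: 0 for n in nodes}
    for _, child in edges:
        indegree[child] += 1
    queue = [n for n in nodes if indegree[n] == 0]
    visited = 0
    while queue:
        node = queue.pop()
        visited += 1
        for parent, child in edges:
            if parent == node:
                indegree[child] -= 1
                if indegree[child] == 0:
                    queue.append(child)
    return visited == len(nodes)


def neighbors(nodes, edges):
    """All DAGs reachable by adding, deleting or reversing a single arc."""
    edges = frozenset(edges)
    result = []
    for u, v in permutations(nodes, 2):          # additions
        if (u, v) not in edges and (v, u) not in edges:
            candidate = edges | {(u, v)}
            if is_acyclic(nodes, candidate):
                result.append(candidate)
    for edge in edges:
        result.append(edges - {edge})            # deletion never creates a cycle
        u, v = edge
        reversed_graph = (edges - {edge}) | {(v, u)}
        if is_acyclic(nodes, reversed_graph):    # reversal
            result.append(reversed_graph)
    return result


print(len(neighbors([0, 1, 2], set())))
print(len(neighbors([0, 1, 2], {(0, 1), (1, 2)})))
```

For the chain 0 → 1 → 2, the arc 2 → 0 is rejected because it would close a cycle, so only one addition is allowed alongside the two deletions and two reversals.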

Fig. 11 illustrates the resulting network structure applying 
the MCMC algorithm where the variable order is the same as in 
Fig. 10. 


Fig. 11. Bayesian network structure applying the MCMC algorithm.


4.4. Improving the network structure


In practice, the search-and-score algorithms are not exact,
and are used only as initial approximations. Also, since the K2 and
MCMC algorithms apply different tradeoffs for searching the
structure, they can produce different results. However,
both structures can be considered in the resulting network
structure. To improve the network structure, the following steps are
executed:


• fusion of the results of the K2 and MCMC algorithms;
• arrangement of groups of variables in layers;
• application of constraint-based conditions and knowledge.


The fusion of the K2 and MCMC results basically
confirms the edges present in both structures (Figs. 10 and 11)
and submits the remaining edges to possible erasure based on
constraints and domain knowledge.

For a better understanding of the relationships among variables,
these are separated into several layers. In this structure,
three layers are considered: fault causes, pattern recognition,
and sensors. Fault causes are the possible causes of faults, such
as faults in the air fan (aF), faults in the refrigeration system
(rF), growth of $J_n$, and low H2 pressure (see Sections 3.1-3.4).
Sensors are variables that can easily be monitored using sensors
(such as the output stack voltage V, the electrical current $I_{FC}$, the
temperature T, the power, and the H2 pressure). Pattern recognition is
associated with variables that are difficult to monitor in a real machine,
but that play an important role in a cause-effect structure and define
a fault pattern.

Some of the constraints to be considered are: (1) independent 
fault causes, i.e. only one fault takes place at a time, and one 
fault cause does not influence another; (2) independent sensors, 
i.e. edges among sensors can be erased because their values are 
always observed. 

After that, domain knowledge is applied: the submitted edges are 
compared with the relationships among the variables in the 
process. For example, an edge from variable 7 (stoichiometry) to 
variable 16 (pH2) appears in Fig. 11 (applying MCMC) but not in 
Fig. 10 (applying K2), so it is one of the edges submitted for 
erasing. According to Fig. 7, a variation of pH2 does not have a 
significant influence on the stoichiometry; it is therefore 
concluded that this edge does not match the process evolution, 
and the edge is erased. A similar process is applied to all the 
remaining edges. 
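
The three improvement steps can be sketched as set operations on edge lists. This is only an illustration under stated assumptions: the `LAYER` assignment, the example edges, and the function names are hypothetical (only the stoichiometry-to-pH2 edge is taken from the text), and edges confirmed by both algorithms are assumed to be kept without further checks.

```python
# Hypothetical layer assignment for a handful of variables.
LAYER = {
    "aF": "cause", "rF": "cause",
    "Flow": "pattern", "lambda": "pattern",
    "T": "sensor", "IFC": "sensor", "pH2": "sensor",
}


def fuse(k2_edges, mcmc_edges):
    """Split edges into those found by both algorithms and those found by one."""
    confirmed = k2_edges & mcmc_edges
    submitted = k2_edges ^ mcmc_edges  # present in exactly one structure
    return confirmed, submitted


def violates_constraints(edge):
    """Constraint (1): no cause -> cause edges; (2): no sensor -> sensor edges."""
    parent, child = edge
    both_causes = LAYER[parent] == LAYER[child] == "cause"
    both_sensors = LAYER[parent] == LAYER[child] == "sensor"
    return both_causes or both_sensors


def improve(k2_edges, mcmc_edges, rejected_by_expert):
    """Keep confirmed edges plus submitted edges that survive both filters."""
    confirmed, submitted = fuse(k2_edges, mcmc_edges)
    kept = {
        e for e in submitted
        if not violates_constraints(e) and e not in rejected_by_expert
    }
    return confirmed | kept


# Illustrative (made-up) edge sets; ("lambda", "pH2") mirrors the example above.
k2 = {("aF", "Flow"), ("Flow", "lambda"), ("T", "IFC")}
mcmc = {("aF", "Flow"), ("lambda", "pH2"), ("rF", "T")}
final_edges = improve(k2, mcmc, rejected_by_expert={("lambda", "pH2")})
```

Here the domain-knowledge step is reduced to a set of edges rejected by hand, which reflects that this step relies on an understanding of the process rather than on an automatic rule.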

Fig. 12 illustrates the resulting Bayesian structure. 


4.5. Conditional probability estimation 


The probabilities in Bayesian networks are represented by CP 
objects (CP = conditional probability), which define the proba- 
bility distribution of a node given its parents. When all nodes 
contain discrete values, a CP object can be described as a table. 

Table 2 presents the CP obtained by the maximum a posteriori 
likelihood algorithm [15] on the network structure shown in 
Fig. 12. Note that the probabilities of nodes 1-4 correspond to 
prior probabilities (i.e. nodes 1-4 do not have parents), whereas 
the probabilities of nodes 5, 6, ..., 16 correspond to 
conditional probabilities. 


Table 2 

Conditional probabilities of the Bayesian network (F = false, T = true) 


node Jn 
F: 0.7465 
T: 0.2535 

node rF 
F: 0.7490 
T: 0.2510 

node Flow 
F F: 1.0000 0.0000 
T F: 0.0000 1.0000 
F T: 0.6829 0.3171 
T T: 0.0000 0.0000 

node λ 
F: 0.9697 0.0303 
T: 0.2469 0.7531 

node Drying 
F: 0.9794 0.0206 
T: 0.2777 0.7223 

node Overld 
F: 0.7728 0.2272 
T: 0.7607 0.2393 

node IFC 
F F F: 0.6550 0.3450 
T F F: 0.4946 0.5054 
F T F: 0.9000 0.1000 
T T F: 0.0000 0.0000 
F F T: 0.9689 0.0311 
T F T: 0.0582 0.9418 
F T T: 0.9298 0.0702 
T T T: 0.0000 1.0000 

node Power 
F: 0.9872 0.0128 
T: 0.7565 0.2435 

node aF 
F: 0.7572 
T: 0.2428 

node H2 
F: 0.7473 
T: 0.2527 

node Qgen 
F F: 0.8029 0.1971 
T F: 0.0681 0.9319 
F T: 0.8123 0.1877 
T T: 0.0000 0.0000 

node Flood 
F: 0.9119 0.0881 
T: 0.0014 0.9986 

node HRout 
F F: 0.8467 0.1533 
T F: 0.9000 0.1000 
F T: 0.0000 1.0000 
T T: 0.0000 0.0000 

node Volt 
F: 0.9637 0.0363 
T: 0.8113 0.1887 

node T 
F: 1.0000 0.0000 
T: 0.7604 0.2396 

node pH2 
F: 1.0000 0.0000 
T: 0.0000 1.0000 
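
As a rough illustration of how a CP table can be estimated from fault records, the sketch below computes a maximum a posteriori estimate under a symmetric Dirichlet prior. It is an assumption-laden example, not the algorithm of [15]: the function name `map_cpt`, the prior parameter `alpha`, the record format, and the choice of aF as the parent of Flow are all hypothetical. With `alpha = 1` the estimate reduces to the relative frequency.

```python
from collections import Counter


def map_cpt(records, child, parents, alpha=2.0, states=(False, True)):
    """MAP estimate of p(child | parents) with a symmetric Dirichlet(alpha) prior.

    records: list of dicts mapping variable name -> state.
    Returns {parent_state_tuple: {child_state: probability}}. Only parent
    configurations that occur in the records are included. Requires alpha >= 1.
    """
    counts = Counter(
        (tuple(r[p] for p in parents), r[child]) for r in records
    )
    parent_configs = {config for config, _ in counts}
    cpt = {}
    for config in parent_configs:
        total = sum(counts[(config, s)] for s in states)
        # Mode of the Dirichlet posterior: (N_k + alpha - 1) / (N + K*(alpha - 1)).
        denom = total + len(states) * (alpha - 1.0)
        cpt[config] = {
            s: (counts[(config, s)] + alpha - 1.0) / denom for s in states
        }
    return cpt


# Made-up fault records for illustration only.
records = (
    [{"aF": True, "Flow": True}] * 3
    + [{"aF": True, "Flow": False}]
    + [{"aF": False, "Flow": False}] * 4
)
cpt_flow = map_cpt(records, child="Flow", parents=["aF"])
```

Parent configurations that never occur in the records are left out of the result here, since no data is available to estimate them.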

The Bayesian network B, composed of the network structure G plus 
the conditional probabilities CP, is ready to be used for fault 
diagnosis in a PEMFC. An inference is the computation of a 
conditional probability p(X_A | X_B), where X_A is the variable 
of interest (e.g. the most probable fault cause) and X_B is the 
variable, or set of variables, that has been observed (i.e. the 
effects observed by sensors). 
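
Written out, the inference described above follows from the definition of conditional probability and from the factorization of the joint distribution over the 16 nodes of the network. The symbols h (the unobserved variables other than the variable of interest) and pa(x_i) (the parents of node i) are notation introduced here for illustration:

```latex
\[
p(X_A \mid X_B = x_B)
  = \frac{p(X_A,\, x_B)}{p(x_B)}
  = \frac{\sum_{h} p(X_A,\, h,\, x_B)}
         {\sum_{x_A} \sum_{h} p(x_A,\, h,\, x_B)},
\qquad
p(x_1, \dots, x_{16}) = \prod_{i=1}^{16} p\bigl(x_i \mid \mathrm{pa}(x_i)\bigr).
\]
```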

There are many different algorithms for calculating inference in 
Bayesian networks, which apply different tradeoffs between speed, 
complexity, generality, and accuracy [15]. The variable 
elimination algorithm permits inference on a Bayesian network 
with a generic structure. The JavaBayes system [8] implements 
this algorithm with a graphical interface. Figs. 13 and 14 
illustrate the use of this program for inference in fault 
diagnosis of PEMFCs. Fig. 13 depicts the graphical representation 
of the Bayesian structure. In this case, electrical current IFC 
and temperature T are the evidence observed (i.e. IFC = 1 and 
T = 1 indicate a type of abnormal situation). In Fig. 14, the 
conditional probabilities have been calculated for all fault 
causes (Jn, aF, rF and H2). When IFC = 1 and T = 1, the most 
probable fault cause is aF (reduction in air flow) with 74% 
probability. The causes rF and Jn have intermediate 
probabilities, 39% and 34%, respectively, and cause H2 has the 
lowest probability, 4%. 
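
To make the variable elimination step concrete, the sketch below runs it on a three-node chain aF -> Flow -> T with binary variables. It is a simplified illustration and not the JavaBayes implementation: the chain structure and the conditional probabilities for Flow and T are hypothetical, the helper names are invented for this example, and only the prior of aF is taken from Table 2.

```python
from itertools import product


class Factor:
    """A table over binary variables: {tuple of states: value}."""

    def __init__(self, variables, table):
        self.variables = tuple(variables)
        self.table = dict(table)


def restrict(f, var, value):
    """Fix an observed variable to its value and drop it from the factor."""
    if var not in f.variables:
        return f
    i = f.variables.index(var)
    table = {k[:i] + k[i + 1:]: v for k, v in f.table.items() if k[i] == value}
    return Factor(f.variables[:i] + f.variables[i + 1:], table)


def multiply(f, g):
    """Pointwise product of two factors over the union of their variables."""
    variables = f.variables + tuple(v for v in g.variables if v not in f.variables)
    table = {}
    for states in product((False, True), repeat=len(variables)):
        assign = dict(zip(variables, states))
        fk = tuple(assign[v] for v in f.variables)
        gk = tuple(assign[v] for v in g.variables)
        table[states] = f.table[fk] * g.table[gk]
    return Factor(variables, table)


def sum_out(f, var):
    """Marginalize a variable out of a factor."""
    i = f.variables.index(var)
    table = {}
    for k, v in f.table.items():
        key = k[:i] + k[i + 1:]
        table[key] = table.get(key, 0.0) + v
    return Factor(f.variables[:i] + f.variables[i + 1:], table)


def query(factors, evidence, elimination_order):
    """Posterior of the single variable left after evidence and elimination."""
    for var, value in evidence.items():
        factors = [restrict(f, var, value) for f in factors]
    for var in elimination_order:
        related = [f for f in factors if var in f.variables]
        if not related:
            continue
        prod = related[0]
        for f in related[1:]:
            prod = multiply(prod, f)
        factors = [f for f in factors if var not in f.variables]
        factors.append(sum_out(prod, var))
    result = factors[0]
    for f in factors[1:]:
        result = multiply(result, f)
    total = sum(result.table.values())
    return {k[0]: v / total for k, v in result.table.items()}


# Prior of aF from Table 2; the two conditional tables below are hypothetical.
p_aF = Factor(["aF"], {(True,): 0.2428, (False,): 0.7572})
p_flow = Factor(["Flow", "aF"], {(True, True): 0.90, (False, True): 0.10,
                                 (True, False): 0.05, (False, False): 0.95})
p_T = Factor(["T", "Flow"], {(True, True): 0.80, (False, True): 0.20,
                             (True, False): 0.10, (False, False): 0.90})

posterior = query([p_aF, p_flow, p_T], evidence={"T": True},
                  elimination_order=["Flow"])
```

The result `posterior[True]` is the probability of the fault cause aF given the observed abnormal temperature, which is the kind of quantity reported in Fig. 14 for the full network.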

Several tests have been conducted to verify the effectiveness 
of the diagnosis; in all tests performed, the diagnosis always 
indicated the true cause as the most probable one [16]. 




Fig. 12. Network structure for fault diagnosis in a PEMFC. 



Fig. 13. Bayesian structure in the JavaBayes system. 
Posterior distribution: 
probability ( "Jn" ) { // 1 variable(s) and 2 values 
    table 
        0.3459876746073625   // p(true | evidence ) 
        0.6540123253926375;  // p(false | evidence ); 
} 

Posterior distribution: 
probability ( "aF" ) { // 1 variable(s) and 2 values 
    table 
        0.7475274420645679   // p(true | evidence ) 
        0.2524725579354322;  // p(false | evidence ); 
} 

Posterior distribution: 
probability ( "rF" ) { // 1 variable(s) and 2 values 
    table 
        0.38998659582768963  // p(true | evidence ) 
        0.6100134041723103;  // p(false | evidence ); 
} 

Posterior distribution: 
probability ( "H2" ) { // 1 variable(s) and 2 values 
    table 
        0.04563026894053219  // p(true | evidence ) 
        0.9543697310594678;  // p(false | evidence ); 
} 


Fig. 14. Inference calculation in the JavaBayes system. 


Network structures representing a diagnostic process play a 
fundamental role in fault-tolerant machines, since they can be 
associated with fault treatment processes (i.e. performing the 
fault diagnosis to identify the fault cause and then executing 
the automatic recovery process). In [18] and [17], fault 
detection and fault treatment by automatic recovery processes in 
electric autonomous guided vehicles (AGV) and machining processes 
have been analyzed. 


5. Conclusion 


A network structure for fault diagnosis in proton exchange 
membrane fuel cells (PEMFC) was constructed using probabilistic 
approaches. 

Fault records of some variables were constructed, including 
variables that are difficult to monitor on a real machine. 
Recording all relevant variables is essential for constructing 
the network structure without hidden variables, especially in the 
intermediate layers. 

Probabilistic approaches alone (such as the K2 and MCMC 
algorithms) are not enough to construct a "good" network 
structure, as shown in Figs. 10 and 11. An understanding of the 
process (e.g. the processes in PEMFCs) is recommended, 
particularly for applying constraint-based conditions and domain 
knowledge to improve the network structure. 

For the diagnostic process (i.e. the inference calculation), 
the evidence was based on observations of variables that can 
be easily monitored by sensors like voltmeters, ammeters, ther- 
mocouples, etc. This allows an easy implementation of fault 
diagnostic processes in FC systems. 

The tests have shown agreement between the inference results 
and the original fault causes. They will allow the implementation 
of an on-line supervisor for fault diagnosis applying Bayesian 
networks constructed as described in this research. 

Topics such as the study of fault effects in FCs, the construc- 
tion of network structures for fault diagnosis in FCs, and their 
association to fault treatment processes are still under study, and 
are still open to research contributions. 